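In the current era of digitization, online payment systems are attracting considerable interest. Improving the efficiency of a payment system is important because it has a substantial impact on business revenue. A gateway is an integral component of a payment system through which every transaction is routed. In an online payment system, payment processors integrate with these gateways through various configurations such as pricing, methods, and risk checks. These configurations are called terminals. Each gateway can have multiple terminals associated with it. Routing a payment transaction through the best terminal is crucial to increasing its probability of success. Machine learning (ML) and artificial intelligence (AI) techniques can be used to accurately predict the best terminals based on their previous performance and various payment-related attributes. We have devised a pipeline consisting of static and dynamic modules. The static module performs the initial filtering of terminals using static rules and a logistic regression model that predicts gateway downtimes. Subsequently, the dynamic module computes a large number of novel features based on success rate, payment attributes, time lag, etc. to model terminal behaviour accurately. These features are updated in real time through a feedback loop using an adaptive time decay rate algorithm, and are passed to a random forest classifier that predicts the success probability of every terminal. This pipeline is currently in production at Razorpay, serving millions of transactions in real time, and has delivered a 4-6% improvement in success rate across all payment methods (credit card, debit card, UPI, net banking). This has made our payment system more resilient to performance drops, which has improved the user experience, instilled more trust in merchants, and boosted business revenue.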
translated by Google Translate
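The time-decayed success-rate features described in the payment routing abstract above could look roughly like the following minimal sketch. It uses a fixed half-life, whereas the abstract describes an adaptive decay rate whose details are not given here; the class name `DecayedSuccessRate`, the `half_life_s` parameter, and its default value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DecayedSuccessRate:
    """Exponentially time-decayed success rate for one terminal (hypothetical helper)."""

    half_life_s: float = 300.0  # assumed value, not taken from the paper
    successes: float = 0.0      # decayed count of successful payments
    attempts: float = 0.0       # decayed count of all payments
    last_ts: float = 0.0        # timestamp (seconds) of the last update

    def update(self, ts: float, success: bool) -> None:
        # Shrink old evidence by 1/2 for every half-life that has elapsed.
        elapsed = max(ts - self.last_ts, 0.0)
        decay = 0.5 ** (elapsed / self.half_life_s)
        self.successes = self.successes * decay + (1.0 if success else 0.0)
        self.attempts = self.attempts * decay + 1.0
        self.last_ts = ts

    def rate(self, prior: float = 0.5) -> float:
        # Fall back to a prior when the terminal has no history yet.
        return self.successes / self.attempts if self.attempts > 0 else prior


# Feedback loop: one success at t=0, one failure one half-life later.
tracker = DecayedSuccessRate(half_life_s=10.0)
tracker.update(0.0, True)
tracker.update(10.0, False)
current_rate = tracker.rate()
```

In a pipeline like the one described, a value such as `current_rate` would be one of many per-terminal features fed to the classifier; the older success counts for less than the recent failure, so the rate falls below one half.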
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLM models for clinical applications.
We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages. By leveraging a quantized representation of speech as a target, Mu$^{2}$SLAM trains the speech-text models with a sequence-to-sequence masked denoising objective similar to T5 on the decoder and a masked language modeling (MLM) objective on the encoder, for both unlabeled speech and text, while utilizing the supervised tasks to improve cross-lingual and cross-modal representation alignment within the model. On CoVoST AST, Mu$^{2}$SLAM establishes a new state-of-the-art for models trained on public datasets, improving on xx-en translation over the previous best by 1.9 BLEU points and on en-xx translation by 1.1 BLEU points. On Voxpopuli ASR, our model matches the performance of an mSLAM model fine-tuned with an RNN-T decoder, despite using a relatively weaker sequence-to-sequence architecture. On text understanding tasks, our model improves by more than 6\% over mSLAM on XNLI, getting closer to the performance of mT5 models of comparable capacity on XNLI and TydiQA, paving the way towards a single model for all speech and text understanding tasks.
We study the problem of efficient generative inference for Transformer models, in one of its most challenging settings: large deep models, with tight latency targets and long sequence lengths. Better understanding of the engineering tradeoffs for inference for large Transformer-based models is important as use cases of these models are growing rapidly throughout application areas. We develop a simple analytical model for inference efficiency to select the best multi-dimensional partitioning techniques optimized for TPU v4 slices based on the application requirements. We combine these with a suite of low-level optimizations to achieve a new Pareto frontier on the latency and model FLOPS utilization (MFU) tradeoffs on 500B+ parameter models that outperforms the FasterTransformer suite of benchmarks. We further show that with appropriate partitioning, the lower memory requirements of multiquery attention (i.e. multiple query heads share a single key/value head) enable scaling up to 32x larger context lengths. Finally, we achieve a low-batch-size latency of 29ms per token during generation (using int8 weight quantization) and a 76% MFU during large-batch-size processing of input tokens, while supporting a long 2048-token context length on the PaLM 540B parameter model.
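To make the memory argument for multiquery attention concrete, the following back-of-the-envelope sketch compares key/value cache sizes. The model dimensions are hypothetical and chosen only for illustration, and the helper `kv_cache_bytes` is not from the paper; the sketch also ignores partitioning across chips, which the abstract says is needed to realize the savings.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, d_head: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """Total key/value cache size in bytes; the factor 2 covers keys plus values."""
    return 2 * n_layers * n_kv_heads * d_head * seq_len * batch * bytes_per_elem


# Hypothetical configuration (not the PaLM configuration).
config = dict(n_layers=32, d_head=128, seq_len=2048, batch=8)
n_query_heads = 16

# Multihead attention stores one key/value head per query head.
multihead = kv_cache_bytes(n_kv_heads=n_query_heads, **config)
# Multiquery attention shares a single key/value head across all query heads.
multiquery = kv_cache_bytes(n_kv_heads=1, **config)

ratio = multihead // multiquery  # equals the number of query heads
```

Under these assumptions the cache shrinks by a factor equal to the number of query heads, which is the headroom that can be spent on longer contexts or larger batches.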
Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation). For instance, Flan-PaLM 540B instruction-finetuned on 1.8K tasks outperforms PaLM 540B by a large margin (+9.4% on average). Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.
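Extracting temporal relations in text is a crucial yet challenging problem in natural language understanding. Depending on the distance between the events, models must learn to weigh information from the local and global contexts surrounding the event pair differently when predicting temporal relations. Learning how to fuse this information has proven challenging for transformer-based language models. We therefore present MulCo: Multi-Scale Contrastive Co-Training, a technique for better fusion of local and global contextualized features. Our model uses a BERT-based language model to encode local context and a graph neural network (GNN) to represent global document-level syntactic and temporal features. Unlike previous state-of-the-art methods, which use simple concatenation of multi-view features or select optimal sentences using sophisticated reinforcement learning approaches, our model co-trains the GNN and BERT modules using a multi-scale contrastive learning objective. The GNN and BERT modules learn a synergistic parameterization by contrasting GNN multi-layer multi-hop subgraphs (i.e., global context embeddings) with BERT outputs (i.e., local context embeddings). We empirically demonstrate that MulCo provides an improved ability to fuse local and global contexts encoded using BERT and a GNN compared to the current state of the art. Our experimental results show that MulCo achieves new state-of-the-art results on several temporal relation extraction datasets.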
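As a rough illustration of contrasting local and global embeddings, the sketch below implements a generic single-scale InfoNCE loss in plain Python. It is not MulCo's multi-scale objective, whose exact form the abstract does not give; the function name `info_nce`, the `temperature` value, and the toy vectors are assumptions, and all embeddings are assumed to be non-zero.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(local_embs, global_embs, temperature=0.1):
    """Mean InfoNCE loss where row i of each view forms the positive pair.

    local_embs stands in for BERT outputs and global_embs for GNN outputs;
    every other row of global_embs acts as a negative for row i.
    """
    a = [_normalize(v) for v in local_embs]
    b = [_normalize(v) for v in global_embs]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of local embedding i against every global embedding.
        logits = [sum(x * y for x, y in zip(a_i, b_j)) / temperature for b_j in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


# Toy check: matched views give a much lower loss than mismatched views.
local_view = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(local_view, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce(local_view, [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing a loss of this kind pulls each local embedding toward its own global counterpart and away from the others, which is one common way to co-train two encoders.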
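In this paper we share findings from our effort to build practical machine translation (MT) systems capable of translating across more than one thousand languages. We describe results in three research areas: (i) building clean, web-mined datasets for 1500+ languages by leveraging semi-supervised pre-training for language identification and developing data-driven filtering techniques; (ii) developing practical MT models for under-served languages by leveraging massively multilingual models trained with supervised parallel data for over 100 high-resource languages and monolingual datasets for an additional 1000+ languages; and (iii) studying the limitations of evaluation metrics for these languages and conducting a qualitative analysis of the outputs of our MT models, highlighting several frequent error modes of these types of models. We hope that our work provides useful insights to practitioners working towards building MT systems for currently understudied languages, and highlights research directions that can complement the weaknesses of massively multilingual models in data-sparse settings.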
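We present Maestro, a self-supervised training method to unify representations learned from the speech and text modalities. Self-supervised learning from speech signals aims to learn the latent structure inherent in the signal, while self-supervised learning from text attempts to capture lexical information. Learning aligned representations from unpaired speech and text sequences is a challenging task. Previous work either implicitly enforced the representations learned from these two modalities to be aligned in the latent space through multitasking and parameter sharing, or did so explicitly by converting between modalities via speech synthesis. The former suffers from interference between the two modalities, while the latter introduces additional complexity. In this paper, we propose Maestro, a novel algorithm that learns unified representations from both modalities simultaneously and can transfer to diverse downstream tasks such as automatic speech recognition (ASR) and speech translation (ST). Maestro learns unified representations through sequence alignment, duration prediction, and matching embeddings in the learned space through an aligned masked language model loss. We establish a new state of the art (SOTA) on VoxPopuli multilingual ASR with an 8% relative reduction in word error rate (WER), on multi-domain SpeechStew ASR (3.7% relative), and on 21-languages-to-English multilingual ST on CoVoST 2 with an improvement of 2.8 BLEU averaged over the 21 languages.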
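Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated Transformer language model, which we call the Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system that enables highly efficient training across multiple TPU Pods. We demonstrate the continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the fine-tuned state of the art on a suite of multi-step reasoning tasks and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements with model scale, meaning that performance increased steeply as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis of bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and potential mitigation strategies.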
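End-to-end speech-to-speech translation (S2ST) without relying on intermediate text representations is a rapidly emerging research area. Recent work has shown that the performance of such direct S2ST systems approaches that of conventional cascade S2ST when trained on comparable datasets. In practice, however, the performance of direct S2ST is bounded by the availability of paired S2ST training data. In this work, we explore multiple approaches for leveraging much more widely available unsupervised and weakly supervised speech and text data to improve the performance of direct S2ST based on Translatotron 2. With our most effective approaches, the average translation quality of direct S2ST on 21 language pairs on the CVSS-C corpus improves by +13.6 BLEU (or +113%) compared to the previous state of the art trained without additional data. The improvements on low-resource languages are even more significant (+398% relative on average). Our comparative studies suggest future research directions for S2ST and speech representation learning.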